Practical Code for Calculating Customer Lifetime Value (CLV)
Customer Lifetime Value (CLV) is an estimate of the total net profit attributable to a single customer. It’s an important metric because it helps a business decide how much it can afford to spend on advertising to acquire a single customer.
In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
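# Use the fully qualified option name; the bare 'max_columns' is ambiguous in recent pandas
pd.set_option('display.max_columns', 50)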
mpl.rcParams['lines.linewidth'] = 2
%matplotlib inline
Data Exploration
For this example we’ll calculate CLV from a dataset of roughly 4,200 transactions.
In [3]:
data = pd.read_csv('/Users/crucker/Desktop/clv_transactions.csv')
data.head(6)
Out[3]:
In [176]:
data.tail(6)
Out[176]:
In [177]:
Transactions = data['CustomerID'].count()
In [178]:
# Count distinct customers; max() is only correct if the IDs run from 1 to N without gaps
Customers = data['CustomerID'].nunique()
In [179]:
MinTransactionDate = data['TransactionDate'].min()
In [180]:
MaxTransactionDate = data['TransactionDate'].max()
In [181]:
Amount = data['Amount'].sum()
In [182]:
summary = [Transactions, Customers, MinTransactionDate, MaxTransactionDate, round(Amount, 2)]
summary
Out[182]:
As with any analysis, the first thing we’ll do is look at some basic summary statistics.
In [9]:
# Use a separate name so the transactions DataFrame `data` is not overwritten
summary_data = {'Transactions': [4181],
'Customers': [1000],
'MinTransactionDate': ['2010-01-04'],
'MaxTransactionDate': ['2015-12-31'],
'Amount': [33729.91]}
df = pd.DataFrame(summary_data, index=[''])
df
Out[9]:
In [210]:
TransactionsPerCustomer = round(Transactions / Customers, 2)
TransactionsPerCustomer
Out[210]:
In [211]:
AmountPerTransaction = round(Amount / Transactions, 2)
AmountPerTransaction
Out[211]:
In [212]:
AmountPerCustomer = round(Amount / Customers, 2)
AmountPerCustomer
Out[212]:
Note that the data consists of 1,000 customers who made transactions between 2010 and 2015. On average, each customer made about 4 transactions of roughly $8 apiece, for a total close to $34. This amount can be considered a lower bound on CLV: it’s the average total spent per customer so far, and we still expect existing customers to make future purchases.
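The averages above can also be derived from a table with one row per customer, which makes it easy to look at the spread as well as the mean. The following is a minimal sketch on a small made-up sample; the `sample` frame and its values are hypothetical, and only the column names (CustomerID, TransactionDate, Amount) come from this notebook.

```python
import pandas as pd

# Hypothetical stand-in for the transactions table (values are made up).
sample = pd.DataFrame({
    'CustomerID': [1, 1, 2, 3, 3, 3],
    'TransactionDate': ['2010-01-04', '2011-03-15', '2012-07-01',
                        '2013-02-10', '2014-05-20', '2015-12-31'],
    'Amount': [9.99, 5.00, 12.50, 7.25, 7.25, 3.00],
})

# One row per customer: number of transactions and total amount spent.
per_customer = sample.groupby('CustomerID')['Amount'].agg(
    Transactions='count', TotalAmount='sum')

# The mean of TotalAmount is the same quantity as Amount / Customers.
amount_per_customer = per_customer['TotalAmount'].mean()

print(per_customer)
print(round(amount_per_customer, 2))
```

Calling `per_customer.describe()` on the real data would show how far individual customers sit from the average, which the single summary number hides.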
In [213]:
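# Use a separate name so the transactions DataFrame `data` is not overwritten
per_customer_data = {'TransactionsPerCustomer': [4.18],
'AmountPerTransaction': [8.07],
'AmountPerCustomer': [33.73]}
df = pd.DataFrame(per_customer_data, index=[''])
df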
Out[213]:
In [214]:
more_summary = [TransactionsPerCustomer, AmountPerTransaction, AmountPerCustomer]
more_summary
Out[214]:
We should check for outlier transactions and, if any exist, remove them from the data entirely. Here we inspect the largest transactions.
In [4]:
data.loc[data['Amount'] >= 29.99]
Out[4]:
In [6]:
import seaborn as sns
sns.set(color_codes=True)
Plotting Univariate Distributions
We could use a statistical test to check for outliers, but here it’s pretty clear that none exist. Plotting the entire distribution of transaction amounts should give us more confidence in our assertion.
In [8]:
plt.title('Distribution of Transaction Amounts', fontsize=14, fontweight="bold")
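# distplot is deprecated in recent seaborn; histplot with a KDE overlay gives a similar view
sns.histplot(data['Amount'], kde=True, stat='density', color='#3498db')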
Out[8]:
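For a check that does not rely on eyeballing the plot, one common rule of thumb is the 1.5 × IQR fence. This is a heuristic rather than a formal statistical test, and it is not necessarily what the analysis here would use; the sketch below assumes a hypothetical helper `iqr_outliers` and a made-up series of amounts.

```python
import pandas as pd

def iqr_outliers(amounts: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying more than k * IQR outside the quartiles."""
    q1 = amounts.quantile(0.25)
    q3 = amounts.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return amounts[(amounts < lower) | (amounts > upper)]

# Hypothetical amounts: seven ordinary purchases and one extreme value.
amounts = pd.Series([5.0, 6.5, 7.0, 8.0, 8.5, 9.0, 10.0, 250.0])
flagged = iqr_outliers(amounts)
print(flagged)
```

On the notebook's data the call would be `iqr_outliers(data['Amount'])`; an empty result would support the claim that no outliers exist under this rule.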
Measuring Historic CLV
Now we need to consider the biggest source of error in our $34 CLV lower bound: some of the underlying customers are brand new, while others have been customers for almost six years. Newer customers will generally have spent less in total than older ones, simply because they have had less time to buy. So we need to separate the customers into groups based on when they were acquired (e.g. customers acquired in 2010 vs. customers acquired in 2011, …).
Since the data spans six calendar years (2010 through 2015), let’s separate customers into annual origin periods starting on 2010-01-01, and measure their purchases annually.
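One way to build such annual origin periods is sketched below. It assumes that a customer's origin is the calendar year of their first transaction, which is an interpretation rather than something the data states, and it runs on a small made-up `sample` frame; the `Origin` and `Year` columns are names introduced here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the transactions table (values are made up).
sample = pd.DataFrame({
    'CustomerID': [1, 1, 2, 2, 3, 3],
    'TransactionDate': ['2010-01-04', '2011-03-15', '2011-06-01',
                        '2011-09-09', '2010-11-20', '2012-02-02'],
    'Amount': [9.99, 5.00, 12.50, 4.00, 7.25, 3.00],
})
sample['TransactionDate'] = pd.to_datetime(sample['TransactionDate'])

# Assumption: a customer's origin is the year of their first transaction.
first_purchase = sample.groupby('CustomerID')['TransactionDate'].transform('min')
sample['Origin'] = first_purchase.dt.year
sample['Year'] = sample['TransactionDate'].dt.year

# Rows: origin year (cohort). Columns: year of purchase. Cells: total amount.
cohort_amounts = sample.pivot_table(index='Origin', columns='Year',
                                    values='Amount', aggfunc='sum',
                                    fill_value=0)
print(cohort_amounts)
```

Reading across a row shows how much one cohort spent in each year since it was acquired, which is the comparison the overall $34 average cannot provide.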
In [ ]: